Genetic relationship and source species identification of 58 Qi-Nan germplasms of Aquilaria species in China that easily form agarwood

Recently, Qi-Nan germplasm, the germplasm of Aquilaria species that easily forms agarwood, has been widely cultivated in Guangdong and Hainan Provinces in China. Because the morphological characteristics of Qi-Nan germplasm are similar to those of Aquilaria species and the germplasm is propagated by grafting, it is difficult to determine its source species from traditional taxonomic characteristics. In this study, we performed a DNA barcoding analysis of 58 major Qi-Nan germplasms, as well as Aquilaria sinensis, A. yunnanensis, A. crassna, A. malaccensis and A. hirta, with five barcode loci: the nuclear internal transcribed spacer 2 (ITS2) and the chloroplast regions matK, trnH-psbA, rbcL and trnL-trnF. This field survey of the Qi-Nan germplasm plantations in Guangdong and Hainan Provinces aimed to accurately identify the source species of Qi-Nan germplasm. According to the results, ITS2 and matK showed the most variability and the highest divergence across all genetic distances. The ITS2+matK combination, selected through TaxonDNA analysis, showed the highest success rate in species identification of the Qi-Nan germplasm. Clustering in the phylogenetic trees constructed with Bayesian inference and maximum likelihood indicated that the Qi-Nan germplasm was most closely related to A. sinensis and more distantly related to A. yunnanensis, A. crassna, A. malaccensis and A. hirta. Therefore, this study determined that the source species of the Qi-Nan germplasm is A. sinensis.

The author(s) received no specific funding for this work.

Agarwood is a resinous wood produced when Aquilaria or Gyrinops species (family Thymelaeaceae) are injured [1]. It is a valuable natural perfume and is used in traditional Chinese medicine to relieve pain and to warm the middle to reduce vomiting [2]. Worldwide, agarwood has been widely featured in cultural, religious, and medicinal practices as well as other areas

forming predisposition of its parents. Grafting propagation can also be used to obtain germplasm resources that are easy to collect and genetically stable, thereby protecting wild Aquilaria species. In recent years, farmers have relied on experience to find wild, highly fragrant Aquilaria trees in Dianbai, Guangdong Province; they then transplant these trees to their homes to serve as Qi-Nan germplasm seed trees and use branches of the seed trees as scions for grafting propagation. However, the main propagation method used by farmers is grafting branches of Qi-Nan germplasm seed trees onto cultivated Aquilaria trees. Most of these seed trees come from Huizhou, Maoming, Shenzhen, Hong Kong, Hainan Province and other places in China. Qi-Nan germplasm readily forms agarwood, and both the yield and the extract content of its agarwood are higher than those of general agarwood.
In recent years, Qi-Nan germplasm has been extensively cultivated in Guangdong, Guangxi, and Hainan Provinces and elsewhere in China because agarwood is scarce and valuable. Each grower claims that the agarwood produced by their Qi-Nan germplasm has a high oil content and a strong fragrance and forms rapidly. However, the source species of many Qi-Nan germplasms remain unclear; it has variously been proposed to be a domestic Aquilaria species, an alien Aquilaria species, or even a new species. At present, many Qi-Nan germplasms are cultivated in China, with substantial variability in plant size, leaf shape, stem morphology and agarwood-forming performance. However, the source species and genetic relationships of these germplasms remain unknown, which limits their use and protection. According to this review, the source species of

Nan germplasm. In this study, DNA barcoding was used to identify the source species of Qi-Nan germplasm and to explore differences among the applicable barcode fragments and combinations. We selected 58 different types of Qi-Nan germplasm from popular markets, together with A. sinensis, A. yunnanensis, A. malaccensis, A. crassna, and A. hirta, as the research objects. Five DNA barcode sequences (ITS2, matK, trnH-psbA, rbcL and trnL-trnF) were compared to screen for the barcode fragment or combination most suitable for identifying the source species of Qi-Nan germplasm. Then, phylogenetic trees of the Qi-Nan germplasm and the five Aquilaria species were constructed with the best combination. Finally, the source species of the Qi-Nan germplasm was identified from the clustering in the phylogenetic trees.

A total of 65 test materials were used in this study (S1 Table) Baozhayou, Hutoumen, Huizhouchenxiang, Zhongshannizhong, Guanxiang1, and Guanxiang2. In addition, we selected five Aquilaria species that had been accurately identified in the previous stage

The total DNA of the 65 samples was extracted with a kit from Tiangen Biotech (Beijing) Co., Ltd. Common primers were used for PCR amplification of ITS2, matK, trnH-psbA, rbcL and trnL-trnF. Optimization and adjustments were made according to the PCR conditions reported in

The PCR amplification success rate and sequencing success rate were determined following Kress [30]. The amplicon length, variable sites, conserved sites, parsimony-informative sites, singleton sites and genetic distances of each fragment were obtained in MEGA X. The species identification success rate was evaluated with the "best match", "best close match" and "all species barcodes" (BBA) criteria in TaxonDNA to identify the single fragment or combination with the highest success rate. Next, phylogenetic trees were constructed using Bayesian inference (BI) and maximum likelihood (ML) in MrBayes 3.2.6 and PAUP 4b, respectively, and the clusters in the tree constructed with the best sequence combination were analyzed. Genetic distances and phylogenetic trees were plotted in R 4.0.0. The GenBank accession numbers of all DNA fragments in this study are shown in S2 Table.

PCR Amplification and DNA Sequencing

sequencing. The success of PCR amplification and sequencing, as well as the sequence length, variable sites, conserved sites, parsimony-informative sites and singleton sites, are shown in Table 2. PCR amplification of all five DNA barcoding loci was successful in all samples. Except for trnL-trnF (sequencing success rate of 0%), all sequences achieved a sequencing success rate of 100%. The trnL-trnF sequences were highly repetitive, which made them unsuitable for sequence alignment, assembly and analysis in this study. The number of variable sites for each sequence was as follows: ITS2 (11) > matK (9) > rbcL (1) = trnH-psbA (1). The number of conserved sites for each sequence was as follows: matK (672) > rbcL (539) > ITS2
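The site categories reported in Table 2 can be illustrated with a minimal Python sketch. The study obtained these counts in MEGA X; the function below is not MEGA's implementation but a hypothetical re-implementation under assumed, commonly used definitions: a variable site has at least two nucleotide states, a parsimony-informative site has at least two states that each occur in at least two sequences, and a singleton site is variable but not parsimony-informative. It also assumes that columns with gaps or ambiguity codes are excluded (complete deletion). The name `site_statistics` and the toy alignment are illustrative only.

```python
from collections import Counter

VALID = set("ACGT")


def site_statistics(alignment):
    """Classify the columns of equal-length aligned DNA sequences.

    Assumption: columns containing a gap or ambiguity code are skipped
    (complete deletion) rather than counted in any category.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    stats = {"conserved": 0, "variable": 0,
             "parsimony_informative": 0, "singleton": 0, "skipped": 0}
    for column in zip(*(s.upper() for s in alignment)):
        if not set(column) <= VALID:
            stats["skipped"] += 1
            continue
        counts = Counter(column)
        if len(counts) == 1:
            stats["conserved"] += 1
            continue
        stats["variable"] += 1
        # Parsimony-informative: at least two states, each seen in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            stats["parsimony_informative"] += 1
        else:
            stats["singleton"] += 1
    return stats


# Toy alignment (hypothetical): two conserved columns, two singletons, one informative site.
print(site_statistics(["ACGTA", "ACGTT", "ATGTT", "ACGCA"]))
```

Every variable site falls into exactly one of the two subcategories, so parsimony-informative plus singleton sites always equals the variable-site count in this sketch.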

Of the four DNA barcodes analyzed, ITS2 and matK had large average genetic distances, trnH-psbA had a smaller average genetic distance, and the average genetic distance of rbcL was 0 (Fig 3). In the ITS2 region, the intraspecific distance among the Qi-Nan germplasms was 0, and the interspecific distances between the Qi-Nan germplasm and A. crassna, A. hirta, A. malaccensis, A. sinensis and A. yunnanensis were 0.0022±1.73E-18, 0.0203±2.08E-17, 0.0157±0, 0 and 0.0090±1.04E-17, respectively. In the matK region, the intraspecific distance among the Qi-Nan germplasms was 0, and the interspecific distances between the Qi-Nan germplasm and A. crassna, A. hirta, A. malaccensis, A. sinensis and A. yunnanensis were 0.0091±5.20E-18, 0.0091±3.47E-18, 0.0012±4.34E-19, 0 and 0.0039±3.90E-18, respectively. In contrast, all genetic distances were 0 in the rbcL region. In the trnH-psbA region, the interspecific distances between the Qi-Nan germplasm and A. hirta and A. sinensis were 0.0027±9.38E-07 and 0.0027±9.47E-07, respectively, and all other genetic distances were 0. In addition, Wilcoxon signed-rank tests further confirmed that ITS2 and matK had the highest divergence across all genetic distances (Fig 3). Among the multilocus combinations, ITS2+matK had higher genetic distances than the other barcode combinations (Fig 4).
In the ITS2+matK region, the intraspecific distance among the Qi-Nan germplasms was 0, and the interspecific distances between the Qi-Nan germplasm and A. crassna, A. hirta, A. malaccensis, A. sinensis and A. yunnanensis were 0.0066±5.20E-18, 0.0132±1.73E-17, 0.0066±4.34E-18, 0 and 0.0058±1.73E-18, respectively.

Nan germplasm and A. crassna. GG: intraspecific distance among the Qi-Nan germplasms. GH: interspecific distance between the Qi-Nan germplasm and A. hirta. GM: interspecific distance between the Qi-Nan germplasm and A. malaccensis. GS: interspecific distance between the Qi-Nan germplasm and A. sinensis. GY: interspecific distance between the Qi-Nan germplasm and A. yunnanensis.)
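The intraspecific (GG) and interspecific (GC, GH, GM, GS, GY) values above are mean pairwise distances within or between groups of aligned sequences. The text does not state which substitution model was used in MEGA X, so the sketch below assumes the Kimura two-parameter (K2P) model, which is common in DNA barcoding. It also assumes that a combined ITS2+matK region is formed by joining each sample's aligned loci end to end; this is an illustration of the idea, not the authors' pipeline. The names `k2p_distance`, `concatenate` and `mean_distance` are hypothetical.

```python
import math
from itertools import combinations, product

NUCLEOTIDES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent (saturation).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def concatenate(*loci):
    """Join per-sample aligned loci (dicts of sample -> sequence), e.g. ITS2 then matK.

    Only samples present in every locus are kept.
    """
    shared = set.intersection(*(set(locus) for locus in loci))
    return {sample: "".join(locus[sample] for locus in loci) for sample in sorted(shared)}


def mean_distance(group_a, group_b=None):
    """Mean K2P distance within one group (group_b=None) or between two groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))  # e.g. GG: Qi-Nan vs Qi-Nan
    else:
        pairs = list(product(group_a, group_b))  # e.g. GS: Qi-Nan vs A. sinensis
    if not pairs:
        raise ValueError("need at least one pair of sequences")
    return sum(k2p_distance(x, y) for x, y in pairs) / len(pairs)
```

Under these assumptions, identical sequences give a distance of 0, which is how a GG or GS value of 0 arises when all compared sequences share the same alignment.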

Preliminary evaluation of the DNA sequences showed that the trnL-trnF sequences were mostly repetitive and had double peaks. Therefore, only the other four barcodes (ITS2, matK, trnH-psbA and rbcL) were used for sequence screening and analysis. TaxonDNA analysis showed that the species identification success rate differed among fragments and combinations (Table 3). Of the single loci, ITS2 and matK performed best, with correct identification rates of 93.84% under the "best match", "best close match" and "all species barcodes" criteria. In contrast, rbcL had the lowest identification success rate (0.00%). The multifragment combinations ITS2+matK, ITS2+rbcL, matK+rbcL, matK+trnH-psbA, ITS2+matK+rbcL, ITS2+matK+trnH-psbA, matK+rbcL+trnH-psbA and ITS2+matK+trnH-psbA+rbcL had the highest success rate, with correct identification rates of 93.84% under all three criteria. However, only ITS2+matK and ITS2+matK+rbcL had the lowest ambiguity (1.53%) under the "all species barcodes" criterion. In addition, the success rate of ITS2+matK was equivalent to that of the three- and four-fragment combinations. Therefore, to simplify the analysis, we selected ITS2+matK to construct the phylogenetic tree.
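The "best match" and "best close match" criteria can be sketched as follows: each sequence is treated as a query, compared with every other sequence, and scored as correct, ambiguous or incorrect depending on the species of its closest match or matches. This is a simplified, hypothetical re-implementation, not TaxonDNA's code. It assumes uncorrected p-distances, treats all equally close matches as ties, and uses a user-supplied threshold for the "best close match" variant; the "all species barcodes" criterion is not implemented. The function names and the toy records are illustrative.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def p_distance(seq1, seq2):
    """Uncorrected pairwise distance over sites without gaps or ambiguity codes."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in NUCLEOTIDES and b in NUCLEOTIDES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def best_match(records, threshold=None, tol=1e-12):
    """Score each sequence against all others, in the spirit of TaxonDNA.

    records: list of (species, aligned_sequence) tuples.
    threshold=None gives a "best match"-style score; a numeric threshold gives a
    "best close match"-style score in which a best match farther away than the
    threshold counts as "no_match".
    """
    if len(records) < 2:
        raise ValueError("need at least two sequences")
    outcomes = Counter()
    for i, (species, seq) in enumerate(records):
        others = [(p_distance(seq, other), sp)
                  for j, (sp, other) in enumerate(records) if j != i]
        best = min(d for d, _ in others)
        if threshold is not None and best > threshold:
            outcomes["no_match"] += 1
            continue
        # All equally close matches are considered together.
        matched = {sp for d, sp in others if abs(d - best) <= tol}
        if matched == {species}:
            outcomes["correct"] += 1
        elif species in matched:
            outcomes["ambiguous"] += 1  # tied with at least one other species
        else:
            outcomes["incorrect"] += 1
    return outcomes


def success_rate(outcomes):
    """Percentage of queries identified correctly."""
    return 100 * outcomes["correct"] / sum(outcomes.values())


records = [("A", "AAAA"), ("A", "AAAA"), ("B", "CCCC"), ("B", "CCCA")]
print(best_match(records), success_rate(best_match(records)))
```

Ambiguity in this sketch arises exactly when a query's closest matches include both its own species and another species, which is why a marker combination with fewer tied distances can lower the ambiguity rate without changing the correct-identification rate.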

The phylogenetic tree constructed with Bayesian inference (BI) and ITS2+matK is presented in Fig 5.

higher success rate in PCR amplification and sequencing [36]. In addition, compared with cpDNA or nuclear barcodes alone, a combination of the two better identified different species [37].

The genetic distances between the 58 Qi-Nan germplasms and the five Aquilaria species showed divergence mainly in ITS2 and matK, whereas trnH-psbA showed little divergence (Fig 3). In addition, the ITS2 and matK distances were larger between the Qi-Nan germplasm and A. crassna, A. hirta, A. malaccensis, and A. yunnanensis, whereas no genetic distance was detected between the Qi-Nan germplasm and A. sinensis. The trnH-psbA distances were largest between the Qi-Nan germplasm and A. hirta and A. sinensis, but these values were below 0.003. The rbcL fragment showed no genetic distance between the Qi-Nan germplasm and any Aquilaria species. Thus, we inferred that ITS2 and matK were ideal barcodes for this study [38], that variation in trnH-psbA is low [39], and that the coding sequence of rbcL is highly conserved

According to the species identification rates of the four high-quality sequences analyzed with the BBA criteria in TaxonDNA, the multifragment combinations ITS2+matK, ITS2+rbcL, matK+rbcL, matK+trnH-psbA, ITS2+matK+rbcL, ITS2+matK+trnH-psbA, matK+rbcL+trnH-psbA and ITS2+matK+trnH-psbA+rbcL had the highest success rate (Table 3)

trnL-trnF sequence was not applicable in this study. This discrepancy could be explained by differences in test materials, tree-building methods, or the DNA barcodes and combinations adopted. In the current study, the trnL-trnF sequences were mostly repetitive, which hindered cluster analysis of the phylogenetic tree.
Thus, ITS2+matK was selected for cluster analysis of the phylogenetic tree of Qi-Nan germplasm and Aquilaria species.

Through comparison of plant morphology, the fruit of the Qi-Nan germplasm was found to be closest to that of A. sinensis in shape and size (Fig 2). Previously, Aquilaria species were mainly classified by the characteristics of their flowers and fruits [5,41,42]. A. sinensis is chiefly identified by a moderately sized calyx that does not wrap the fruit, a smooth seed coat without yellow pubescence, and long seed appendages. A. yunnanensis has oval fruit, a smaller and scattered calyx, and seeds densely coated with pubescence [6]. A. malaccensis has round fruit and a small calyx that degrades after the fruit ripens, whereas A. crassna has larger, oval or nearly round fruit that is usually wrapped in a larger calyx, as well as thick, leathery leaves [43]. Therefore, the source species of the Qi-Nan germplasm was inferred to be A. sinensis based on plant morphology.
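In both single regions and multilocus combinations, the intraspecific distance among the 58 Qi-Nan germplasms (GG) was 0, and in the most informative regions (ITS2, matK and ITS2+matK) the interspecific distance between the Qi-Nan germplasm and A. sinensis (GS) was the smallest (Figs 3 and 4). Thus, although different types of Qi-Nan germplasm differ markedly in plant morphology and agarwood quality, the 58 Qi-Nan germplasms selected did not differ in molecular identification, and all were most closely related to A. sinensis. Genetic distances can also reflect the relationships between different species and germplasms. For example, Zheng

Nan germplasm, the Qi-Nan germplasm currently cultivated in Guangdong was obtained by grafting (Fig 1). First, branches of wild Aquilaria trees were grafted onto A. sinensis; after the grafts matured, the germplasm was propagated by grafting their branches onto cultivated trees. The scions were mainly wild A. sinensis from Huizhou, Dianbai, Shenzhen, Hong Kong, and Hainan Province in China. Based on the geographical distribution of Aquilaria species and the phylogenetic results, the source species of Qi-Nan germplasm cultivated in China is A. sinensis. However, DNA barcoding has certain limitations and failed to resolve differences among Qi-Nan germplasm resources. Our group is currently conducting a more thorough study using inter-simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) markers as well as other techniques.

The Relationship between "Qi-Nan" Agarwood and Qi-Nan Germplasm

"Qi-Nan" agarwood has different names in different countries and regions, including Chinese names (e.g., Qinan, Jianan, and Jialuo) and English names (e.g., Qi-Nan, Kanankoh, Kyara and Chi-Nan) [47]. In the traditional sense, historical records describe "Qi-Nan" agarwood as the top grade of agarwood from Aquilaria species, formed under extremely demanding conditions and rich in resin, elegant in fragrance and dark in color [48]. It was named for its distinctive scent, which can be perceived without burning the wood, and its unique smell and appearance distinguished it from other types of agarwood as the most expensive, top-quality grade [49]. "Qi-Nan" agarwood is further divided by appearance and color into green, purple, black, and yellow Qi-Nan, among others [48,50]. At present, the market price of "Qi-Nan" agarwood far exceeds that of general agarwood.

Given the exceptionally high economic and collection value of "Qi-Nan" agarwood, cultivation of Qi-Nan germplasm has been heavily promoted in Guangdong, China in recent years, based on claims that planting the germplasm will produce "Qi-Nan" agarwood with the best resin and fragrance. However, this research focused on the source species of Qi-Nan germplasm; the relationship between the agarwood it produces and traditional "Qi-Nan" agarwood remains unclear and unconfirmed. Resolving this question is also key to maintaining the stability of the agarwood market, and "Qi-Nan" agarwood merits further analysis at the chemical, microscopic, and molecular levels. This paper is the first to use DNA barcoding to identify Qi-Nan germplasm cultivated in China and to report that it originated from A. sinensis in China. These findings may inform the future promotion and application of agarwood produced from Qi-Nan germplasm.

The data that support the findings of this study are openly available in the NCBI GenBank database at https://www.ncbi.nlm.nih.gov; the accession numbers OM908943-OM909007 and OM938993-OM939187 are shown in S2 Table.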
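As a quick consistency check on the accession ranges, a short sketch can expand each range and count its records. The helper `expand_accession_range` is hypothetical and assumes each range runs over consecutive numbers with a shared letter prefix. The two ranges contain 65 and 195 accessions (260 in total), which is arithmetically consistent with 65 samples sequenced at the four loci that were sequenced successfully (trnL-trnF was not); how the loci are distributed across the two ranges is given in S2 Table and is not assumed here.

```python
import re


def expand_accession_range(first, last):
    """Expand a GenBank accession range such as OM908943-OM909007 (inclusive)."""
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("range endpoints must share a letter prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, stop = int(m1.group(2)), int(m2.group(2))
    # Zero-pad to the original digit width so the accessions keep their format.
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


ranges = [("OM908943", "OM909007"), ("OM938993", "OM939187")]
counts = [len(expand_accession_range(a, b)) for a, b in ranges]
print(counts, sum(counts))  # [65, 195] 260
```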
Historical records reported "Qi-Nan" agarwood as an Aquilaria species in the traditional sense, 351 referring to top-grade agarwood formed under extremely demanding conditions that was rich in 352 resin, elegant in fragrance and dark in color [48]. It was named for its mysterious scent that could 353 be achieved without burning the wood and was distinguished from other types of agarwood as the 354 most expensive and top-quality due to its unique smell and appearance [49]. "Qi-Nan" agarwood is 355 further divided according to appearance and color into green Qi-Nan, purple Qi-Nan, black Qi-Nan, 356 yellow Qi-Nan, etc. [48,50]. At present, the market price of "Qi-Nan" agarwood has far exceeded 357 that of general agarwood. 358 Given its ultrahigh economic and collection value of "Qi-Nan" agarwood, cultivation of Qi-359 Nan germplasm has received substantial publicity in Guangdong, China in recent years, due to 360 claims that planting the germplasm would produce "Qi-Nan" agarwood with the best resin and 361 fragrance. However, this research focused on the source species of Qi-Nan germplasm instead of 362 the relationship between the produced agarwood and the traditional "Qi-Nan" agarwood, which 363 remains unclear and unconfirmed. This problem is also key to safeguarding the stability of the 364 agarwood market. "Qi-Nan" agarwood merits further analysis, including at the chemical, 365 microscopic, and molecular levels. This paper is the first to use DNA barcoding to identify Qi-Nan 366 germplasm cultivated in China and report that it originated from A. sinensis in China. These findings 367 may inform the future promotion and application of agarwood produced from Qi-Nan germplasm.   The data that support the findings of this study are openly available in the NCBI GenBank database at https://www.ncbi.nlm.nih.gov; the reference numbers OM908943-OM909007 and OM938993-401 OM939187 are shown in S2 Table.  402  403